Combining Transcriptome Assemblies from Multiple De Novo Assemblers in the Allo-Tetraploid Plant Nicotiana benthamiana
نویسندگان
چکیده
BACKGROUND Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels. This can lead to unassembled, partially assembled or mis-assembled contigs. Due to the different properties of de novo assemblers, no one assembler with any one given parameter space can re-assemble all possible transcripts from a transcriptome. RESULTS In an effort to maximise the diversity and completeness of de novo assembled transcripts, we utilised four de novo transcriptome assemblers, TransAbyss, Trinity, SOAPdenovo-Trans, and Oases, using a range of k-mer sizes and different input RNA-seq read counts. We complemented the parameter space biologically by using RNA from 10 plant tissues. We then combined the output of all assemblies into a large super-set of sequences. Using a method from the EvidentialGene pipeline, the combined assembly was reduced from 9.9 million de novo assembled transcripts to about 235,000 of which about 50,000 were classified as primary. Metrics such as average bit-scores, feature response curves and the ability to distinguish paralogous or homeologous transcripts, indicated that the EvidentialGene processed assembly was of high quality. Of 35 RNA silencing gene transcripts, 34 were identified as assembled to full length, whereas in a previous assembly using only one assembler, 9 of these were partially assembled. CONCLUSIONS To achieve a high quality transcriptome, it is advantageous to implement and combine the output from as many different de novo assemblers as possible. We have in essence taking the 'best' output from each assembler while minimising sequence redundancy. We have also shown that simultaneous assessment of a variety of metrics, not just focused on contig length, is necessary to gauge the quality of assemblies.
منابع مشابه
Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome
Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo ass...
متن کاملComparisons of De Novo Transcriptome Assemblers in Diploid and Polyploid Species Using Peanut (Arachis spp.) RNA-Seq Data
The narrow genetic base and limited genetic information on Arachis species have hindered the process of marker-assisted selection of peanut cultivars. However, recent developments in sequencing technologies have expanded opportunities to exploit genetic resources, and at lower cost. To use the genetic information for Arachis species available at the transcriptome level, it is important to have ...
متن کاملAccumulation of Taxadiene by Root Culture of Nicotiana Benthamiana Domin Transformed with Taxadiene Synthase
Nicotiana benthamiana was transformed with a taxadiene synthase (TS) gene for the de novo synthesis of taxadiene. The successfully transformed lines of N. benthamiana producing taxadiene was further cultivated for root culture to produce taxadiene in in vitro system. The roots of lines TSS-5, TSS-7, TSS-8 and TSS-10 accumulated 14.6 22.5 μg/g (dw) taxadiene. Methyl jasmonate treatment to the TS...
متن کاملToward a statistically explicit understanding of de novo sequence assembly
MOTIVATION Draft de novo genome assemblies are now available for many organisms. These assemblies are point estimates of the true genome sequences. Each is a specific hypothesis, drawn from among many alternative hypotheses, of the sequence of a genome. Assembly uncertainty, the inability to distinguish between multiple alternative assembly hypotheses, can be due to real variation between copie...
متن کاملDe Novo Transcriptome Sequence Assembly and Analysis of RNA Silencing Genes of Nicotiana benthamiana
BACKGROUND Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient...
متن کامل